Bioinformatics: A Practical Guide to Next Generation Sequencing Data Analysis (Hamid D. Ismail)

RNA-Seq Data Analysis ◾ 193

significant. The distance between the normal samples and the tumor sample along the x-axis is about 2 log2 fold-change units, which corresponds to roughly a 4-fold difference in expression.
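The x-axis is read here as log2 fold change, so a distance d on that axis converts back to a linear fold change as 2 raised to the power d. For the distance quoted above:

$$\text{fold change} = 2^{d} = 2^{2} = 4$$

Going the other way, a 4-fold difference is \(\log_2 4 = 2\) units on the axis.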

We can also use a heatmap to cluster the most variable genes across the samples. We expect samples from the same condition (normal or tumor) to show similar expression patterns. The following script plots the 10 most variable genes as a heatmap and uses hierarchical clustering to show the relationships between the samples:

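install.packages("gplots")        # run once if gplots is not installed
install.packages("RColorBrewer")  # run once; provides brewer.pal()
library("gplots")                 # heatmap.2()
library("RColorBrewer")           # brewer.pal()
library("edgeR")                  # cpm(); yNorm and sampleinfo come from the earlier steps

# log2 counts per million of the normalized data
logcountsNorm <- cpm(yNorm, log=TRUE)

# variance of each gene across samples; keep the 10 most variable genes
var_genes <- apply(logcountsNorm, 1, var)
select_var <- names(sort(var_genes, decreasing=TRUE))[1:10]
highly_variable_lcpm <- logcountsNorm[select_var,]

# red-yellow-blue palette interpolated to 50 colours
mypalette <- brewer.pal(11, "RdYlBu")
morecols <- colorRampPalette(mypalette)

# one side colour per sample, chosen by its condition (normal or tumor)
col.con <- c("purple", "orange")[factor(sampleinfo$condition)]

png(filename="heatmap1.png")
heatmap.2(highly_variable_lcpm,
          col=rev(morecols(50)), trace="none",
          main="Top 10 most variable genes",
          ColSideColors=col.con, scale="row",
          margins=c(12,8), srtCol=45)
dev.off()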
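The selection, clustering and scaling steps in the script can be tried on a small matrix without the chapter's yNorm and sampleinfo objects. The following is a minimal base-R sketch under stated assumptions: the gene and sample names, the values, and keeping only two top genes are all hypothetical, and hclust(dist()) only mirrors heatmap.2's default clustering (distfun = dist, hclustfun = hclust) rather than reproducing the full plot. To our understanding heatmap.2 builds its dendrograms from the unscaled values and applies scale = "row" only to the colours, which is why the sketch clusters `top` rather than `z`.

```r
# Toy log-CPM matrix: 4 hypothetical genes x 6 samples (values are made up).
# S1-S3 play the role of normal samples, S4-S6 of tumor samples.
lcpm <- matrix(c(1, 1, 1, 1, 1, 1,
                 2, 3, 2, 8, 9, 8,
                 5, 5, 6, 5, 5, 6,
                 0, 1, 0, 6, 7, 9),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("gene", 1:4), paste0("S", 1:6)))
condition <- factor(rep(c("normal", "tumor"), each = 3))

# 1. Variance of each gene across samples; keep the top 2 (the script keeps 10)
var_genes <- apply(lcpm, 1, var)
select_var <- names(sort(var_genes, decreasing = TRUE))[1:2]
top <- lcpm[select_var, , drop = FALSE]

# 2. Cluster samples (columns) with Euclidean distance and complete linkage,
#    the defaults of dist() and hclust()
hc <- hclust(dist(t(top)))
clusters <- cutree(hc, k = 2)

# 3. Row z-scores, as scale = "row" computes for the colours:
#    subtract each gene's mean and divide by its standard deviation
z <- t(scale(t(top)))

# 4. One side colour per sample; indexing by a factor uses its level codes
col.con <- c("purple", "orange")[condition]

print(select_var)
print(clusters)
print(round(z, 2))
print(col.con)
```

In this toy matrix the normal samples S1 to S3 fall into one cluster and the tumor samples S4 to S6 into the other, which is the kind of grouping the heatmap is expected to reveal for real data.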

FIGURE 5.18 Multidimensional scaling (MDS) plot.